Classification revisited: a web of knowledge
Author: Aida Slavic
Abstract
The vision of the Semantic Web (SW) is gradually unfolding and taking shape through a web of linked data, a part of which is built by capturing semantics stored in existing knowledge organization systems (KOS), subject metadata and resource metadata. The content of vast bibliographic collections is currently categorized by a few widely used bibliographic classifications, and we may soon see these collections being mined for information and linked in a meaningful way across the Web. Bibliographic classifications are designed for knowledge mediation and offer both a rich terminology and different ways in which concepts can be categorized and related to each other in the universe of knowledge. Between 1990 and 2010 they were used in various resource discovery services on the Web, and they continue to be used to support information integration in a number of international digital library projects. In this chapter we will revisit some of the ways in which universal classifications, as language-independent concept schemes, can assist humans and computers in structuring and presenting information and formulating queries. Most importantly, we highlight issues important to understanding bibliographic classifications, both in terms of their unused potential and technical limitations.

1. Background

Classifications are created using the intellectual power of ontological observation and analysis and reflect whichever approach one decides to take in defining and grouping things. Whatever the motive for their creation, classifications are logically and semantically organized schemes of concepts representing some kind of reality and as such are essential and versatile tools for the representation and visualisation of a knowledge space.
A classification, by nature, cannot serve all purposes. However, if it is universal with respect to knowledge coverage and structured in a comprehensive, detailed, highly elaborate and logical way, it is likely to be more versatile than a scheme created for one single subject, system or task.

Bibliographic or documentary classifications are a special kind of knowledge classification designed for the mediation of knowledge. Their unique feature is that they are not concerned with objects or entities, as other knowledge classifications are, but rather with subjects, i.e. the ways in which entities are described in documents. Bibliographic classifications systematise phenomena and the topics that can be studied in relation to these phenomena, but they also provide the vocabulary necessary to denote types of knowledge presentation, points of view, the targeted audience or the form of a document. As a subject is a complex construct which ought to be described with a series of concepts in various relationships, a classification can also provide rules on how to express complex interactions between phenomena of knowledge as they appear in documents.

The oldest and best known type of bibliographic classification is the library classification, which is created primarily for the physical arrangement of library shelves. The fact that a classification is created for shelf arrangement, and not for detailed indexing for the purposes of metadata-based information retrieval, may influence its structure and its vocabulary, as will be explained in the following sections. But whatever their original purpose, general bibliographic classifications are known to be complex systems. Although there are many classifications, the three most internationally used general bibliographic classifications are Dewey Decimal Classification (DDC), Universal Decimal Classification (UDC) and Library of Congress Classification (LCC).
They are used in the greatest number of countries and bibliographic collections and are considered de facto standards in information exchange. The main reason that these classifications prevail is a complex set of circumstances primarily related to the power of their ownership, the services based on them and their continuous maintenance and development. DDC, although originally created for the shelf arrangement of a single college library, started to be widely used in Anglo-American libraries from the 19th century onwards, as the best classification for shelf arrangement at the time. Its use throughout and beyond the 20th century was promoted by the OCLC bibliographic service. UDC, originally created as an indexing language for an international universal bibliography project in 1895, gained widespread use through the international presence of its previous owner, the International Federation for Information and Documentation (FID), which supported its development, its translation into more than thirty languages and its world-wide distribution. LCC's popularity can be attributed to its governmentally controlled and well supported administration, the speed with which works are classified and the availability of LCC classmarks on Library of Congress catalogue cards, rather than to its inherent structural quality (Marcella & Newton, 1995: 60).

Pre-print: Aida Slavic (2011) Classification revisited: a web of knowledge. In: Innovations in information retrieval: perspectives for theory and practice. Eds. Allen Foster and Pauline Rafferty. London: Facet, pp. 23-48.

Circumstances in the bibliographic domain are such that Colon Classification (CC), which is certainly the most theoretically praised library classification, is barely used, while Bliss Bibliographic Classification (BC2), which is supposed to demonstrate excellence in classification design, is only 60% complete and is also barely used.
There are also a number of national general classification systems, but their use is confined to a single country or language, which limits their role in the context of global information integration.

Intellectual indexing and classification require a high level of expertise and form an expensive and time-consuming process which may not be suited to, or required for, all information retrieval scenarios. For most text-retrieval tasks, various models of automatic text processing and advanced techniques such as statistical models, language models or machine learning will perform well. However, not all documents are available in digital form, and not all digital documents are textual, in the same language or in the same script. Equally, not all digital collections are available or accessible in an open networked environment, and those that are not will not lend themselves easily to processing by otherwise successful data and knowledge mining methods. The task of merging and integrating information contained in legacy collections into what may become a web of knowledge is still ahead of us.

National bibliographies of most countries continuously collect, describe and classify everything that is published in their respective countries, by their citizens or in their official languages. This practice has existed for over a century, in some instances even longer, and has led to the creation of large bibliographic collections, only part of which may be available in digital form. Although this may hardly be comparable with the scale of information available on the Web, we are still talking about hundreds of millions of documents published in numerous languages all around the world from the beginning of literacy to date. We can assume that in the foreseeable future libraries will continue to classify books for the purpose of collection management, and we know that legacy bibliographic data is something we will try to preserve.
The fact that document collections world-wide are classified using bibliographic classification systems may play a significant role in enabling subject searching across these collections. A significant increase in the use of classification in various information integration and discovery services between 1990 and 2010 clearly indicated such a trend. In the following section we will highlight some shared structural features of bibliographic classifications that are of particular importance for their use in information retrieval and discovery.

2. Classification, how does it work?

In Figure 1 below, we see two ways of representing a knowledge field for the purpose of knowledge browsing: one system using natural language terms and the other employing a systematic or classificatory arrangement.

Figure 1: Presenting knowledge for browsing

If we use words for indexing, the only way we can mechanically display the subject index of a collection is alphabetically. Two concepts will find themselves in close proximity not because of their semantic similarity but rather owing to the accident of their names. Classification, on the other hand, groups concepts semantically, according to the closeness of their meaning. But in order to 'fix' this systematic organization, classification needs a notational device: a numeric, alphabetic or alphanumeric symbol that represents the class arrangement and supports its mechanical manipulation. Notation is the shortest possible way of expressing what is sometimes a very complex subject and is very useful in labelling physical documents or metadata for the purpose of systematic arrangement.
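The difference between alphabetical and systematic display described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the classmarks and captions are invented for the example rather than copied from Figure 1 or from any published schedule (only 536 for Heat appears in this chapter).

```python
# Illustrative index entries as (classmark, caption) pairs.
# Assumption: these values are made up for the example; only "536 Heat"
# is mentioned in the chapter itself.
ENTRIES = [
    ("54", "Chemistry"),
    ("536", "Heat"),
    ("5", "Natural sciences"),
    ("547", "Organic chemistry"),
    ("53", "Physics"),
]


def alphabetical(entries):
    """Order captions by name, as a word-based subject index would."""
    return [caption for _, caption in sorted(entries, key=lambda e: e[1].lower())]


def systematic(entries):
    """Order captions by classmark.

    Decimal (fractional) notation files character by character, so the
    classmarks are compared as strings, not as integers: 536 files before 54.
    """
    return [caption for _, caption in sorted(entries, key=lambda e: e[0])]


if __name__ == "__main__":
    print("Alphabetical:", alphabetical(ENTRIES))
    print("Systematic:  ", systematic(ENTRIES))
```

In the alphabetical list Heat sits between Chemistry and Natural sciences only because of its name; in the systematic list it follows Physics, and Organic chemistry follows Chemistry, because the notation carries the semantic grouping.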
In Figure 1 we see an example of a numerical, decimal (fractional) notation, where each digit represents a decimal level that corresponds to a level of subdivision; the dot after every third digit is inserted only for the convenience of reading. We call this type of notation hierarchically expressive: the longer the notation, the more specific the class it represents, and by removing the last digit we automatically broaden the class. Not all notations are decimal or hierarchically expressive; they can be a simple ordering device, as is the case with LCC and BC2:

    Q          Science
    QD1-999    Chemistry
    QD241-441  Organic chemistry
    Example from LCC

    AZ  Science
    C   Chemistry
    CO  Organic
    Examples from BC2

Notations (classmarks) represent the class meaning in a universal way no matter how many terms we use to describe it, or in which language these terms may be. For example, if we use 536 to represent Heat (Thermodynamics) in the bibliographic services of China, Russia or the United Kingdom, we will be able to search for documents on this topic irrespective of the language or script in which they are published.
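As a rough sketch of how a hierarchically expressive notation supports mechanical manipulation, the Python below broadens a class by dropping its last digit, formats a classmark with a dot after every third digit, and retrieves records by classmark regardless of the language of their titles. The function names, the sample records and the classmark 536.7 are assumptions made for illustration; they are not taken from any real catalogue or schedule.

```python
def normalise(classmark):
    """Strip the readability dots so only the significant digits remain."""
    return classmark.replace(".", "")


def display(classmark):
    """Insert a dot after every third digit, purely for ease of reading."""
    digits = normalise(classmark)
    return ".".join(digits[i:i + 3] for i in range(0, len(digits), 3))


def broader(classmark):
    """Return the next broader class (last digit removed), or None at the top."""
    return normalise(classmark)[:-1] or None


def ancestors(classmark):
    """List all broader classes, from the nearest to the most general."""
    chain = []
    parent = broader(classmark)
    while parent:
        chain.append(parent)
        parent = broader(parent)
    return chain


# Hypothetical records: titles in different languages and scripts,
# all indexed with classmarks from the same decimal scheme.
RECORDS = [
    {"title": "Thermodynamics", "lang": "en", "classmark": "536"},
    {"title": "Термодинамика", "lang": "ru", "classmark": "536.7"},
    {"title": "热力学", "lang": "zh", "classmark": "536"},
    {"title": "Organic chemistry", "lang": "en", "classmark": "547"},
]


def search(records, classmark):
    """Return records classed at `classmark` or at any of its subdivisions."""
    wanted = normalise(classmark)
    return [r for r in records if normalise(r["classmark"]).startswith(wanted)]


if __name__ == "__main__":
    print(display("5367"), ancestors("536.7"))
    for record in search(RECORDS, "536"):
        print(record["lang"], record["title"])
```

The prefix match in `search` relies on the notation being hierarchically expressive. For notations that serve only as an ordering device, such as the LCC and BC2 examples above, broader and narrower classes would have to be looked up in the schedule itself rather than derived from the classmark.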
Journal: CoRR
Volume: abs/1705.07058
Publication date: 2017